
Comparing Human-Centric and Robot-Centric Sampling for Robot Deep Learning from Demonstrations


Abstract

Motivated by recent advances in Deep Learning for robot control, this paper considers two learning algorithms in terms of how they acquire demonstrations. "Human-Centric" (HC) sampling is the standard supervised learning algorithm, where a human supervisor demonstrates the task by teleoperating the robot to provide trajectories consisting of state-control pairs. "Robot-Centric" (RC) sampling is an increasingly popular alternative used in algorithms such as DAgger, where a human supervisor observes the robot executing a learned policy and provides corrective control labels for each state visited. RC sampling can be challenging for human supervisors and prone to mislabeling. RC sampling can also induce error in policy performance because it repeatedly visits areas of the state space that are harder to learn. Although policies learned with RC sampling can be superior to HC sampling for standard learning models such as linear SVMs, policies learned with HC sampling may be comparable with highly-expressive learning models such as deep learning and hyper-parametric decision trees, which have little model error. We compare HC and RC using a grid world and a physical robot singulation task, where in the latter the input is a binary image of a connected set of objects on a planar worksurface and the policy generates a motion of the gripper to separate one object from the rest. We observe in simulation that for linear SVMs, policies learned with RC outperformed those learned with HC, but that with deep models this advantage disappears. We also find that with RC, the corrective control labels provided by humans can be highly inconsistent. We prove there exists a class of examples where, in the limit, HC is guaranteed to converge to an optimal policy while RC may fail to converge.
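To make the distinction between the two sampling schemes concrete, the following Python sketch contrasts how each one collects training data. It is illustrative only: the env, supervisor, and policy objects and their act/step/reset interfaces are hypothetical placeholders, not the authors' implementation. The key difference is whose controls drive the rollout: under HC the supervisor's demonstrated controls determine which states are visited, while under RC the robot's learned policy determines the visited states and the supervisor only labels them.

def collect_hc(env, supervisor, n_trajectories, horizon):
    """Human-Centric sampling: the supervisor teleoperates the robot,
    so every state in the dataset lies on the supervisor's own trajectory."""
    data = []
    for _ in range(n_trajectories):
        state = env.reset()
        for _ in range(horizon):
            control = supervisor.act(state)      # human chooses the control
            data.append((state, control))        # record the state-control pair
            state = env.step(control)            # the human's control is executed
    return data

def collect_rc(env, supervisor, policy, n_trajectories, horizon):
    """Robot-Centric sampling (DAgger-style): the robot executes its learned
    policy and the supervisor provides a corrective label for each visited state."""
    data = []
    for _ in range(n_trajectories):
        state = env.reset()
        for _ in range(horizon):
            label = supervisor.act(state)        # corrective control label
            data.append((state, label))
            state = env.step(policy.act(state))  # the robot's action drives the rollout
    return data

Because the RC rollout follows the learned policy rather than the supervisor, the dataset concentrates on whatever states that policy actually reaches, which, as the abstract notes, tend to be harder to learn and harder for humans to label consistently.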
